-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-38424][planner] Support to parse VECTOR_SEARCH function #27039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...le-common/src/main/java/org/apache/flink/table/connector/source/VectorSearchTableSource.java
Outdated
Show resolved
Hide resolved
| * vectors during runtime. | ||
| * | ||
| * <p>Compared to {@link ScanTableSource}, the source does not have to read the entire table and can | ||
| * lazily fetch individual values from a (possibly continuously changing) external table when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be interesting to compare it to a LookupTableSource also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add this part in the document.
| PARAM_SEARCH_TABLE, | ||
| PARAM_COLUMN_TO_SEARCH, | ||
| PARAM_COLUMN_TO_QUERY, | ||
| PARAM_TOP_K)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I an see what TOP_K is from googling. It would be useful to add in the documentation describing the parameters with this change - including the default. I wonder if we should add paging, to be able to handle a large number of results or is this not done with vector databases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, the document will be added when all tasks are finished. The current API doesn't support paging, I think we can leave this as the future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fsk119 ! Left some comments
| throw new ValidationException( | ||
| "The query column is not literal, please use LATERAL TABLE to run VECTOR_SEARCH."); | ||
| } | ||
| SqlValidatorScope scope = getSelectScope((SqlSelect) binding.operand(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks there's no need to cast based on line 2618?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It requires. Because line 2618 uses SqlCall to get operand(its return type is <S extends SqlNode> S ), but here we uses SqlCallBinding to get operand(its return type is SqlNode). Therefore, we still need cast here. 0.0
We can not use SqlCall to extract operand because VECTOR_SEARCH allows user to use named argument, which means the operands is out of order and we need to use the name to reorder the operands.
| private static final String PARAM_SEARCH_TABLE = "SEARCH_TABLE"; | ||
| private static final String PARAM_COLUMN_TO_SEARCH = "COLUMN_TO_SEARCH"; | ||
| private static final String PARAM_COLUMN_TO_QUERY = "COLUMN_TO_QUERY"; | ||
| private static final String PARAM_TOP_K = "TOP_K"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing the optional config param from FLIP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But I plan to add this in the https://issues.apache.org/jira/browse/FLINK-38430
| Collections.singletonList(SqlKind.SELECT)))) { | ||
| boolean queryColumnIsNotLiteral = | ||
| binding.operand(2).getKind() != SqlKind.LITERAL; | ||
| if (!queryColumnIsNotLiteral && !lateral) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't lateral always needed? What's the syntax for literal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LATERAL is not always needed if the SqlCall doesn't contain correlation. For exmaple, users can use the following statement to search.
SELECT * FROM TABLE(VECTOR_SEARCH(TABLE VectorTable, DESCRIPTOR(`g`), ARRAY[1.5, 2.0], 10))
Here, the query input is ARRAY[1.5, 2.0].
CC
Line 95 in 06c61b8
| "SELECT * FROM TABLE(VECTOR_SEARCH(TABLE VectorTable, DESCRIPTOR(`g`), ARRAY[1.5, 2.0], 10))"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it's the syntax but your test expects an exception, so it's not a valid sql?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a valid sql. But we don't add literal related rule in physical phase, so planner can not translate the sql correctly. But the exception indicates the planner can parse the statement correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I guess it's more clear to have something like https://github.com/apache/flink/pull/26553/files#diff-19970e8600e459e820e1310beed925a10450f695698257d85648e8114b5e5aaeR92 to indicate it's not invalid case.
What is the purpose of the change
Support to parse VECTOR_SEARCH function.
Brief change log
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)Documentation